Hybrid Text Chunking
Abstract
This paper describes an HMM-based chunk tagger and its extensions, used at KRDL for the CoNLL-2000 shared task. Compared with a standard HMM-based tagger, this tagger incorporates more contextual information into each lexical entry. Moreover, an error-driven learning approach is adopted to reduce the memory requirement: it keeps only the positive lexical entries that contribute to error reduction. This makes it possible to incorporate more context-dependent lexical entries and so improve performance. Finally, memory-based learning is integrated to further improve the performance of the chunk tagger.
Similar resources
A Text Chunker and Hybrid POS Tagger for Indian Languages
Part-of-Speech (POS) tagging is the task of automatically annotating each word in a text document with its syntactic category. This paper presents a generic hybrid POS tagger for Indian languages. Indian languages are relatively free word order, morphologically productive and agglutinative languages. In this hybrid implementation we have used a combination of statistical approach...
Determining the Boundaries and Types of Syntactic Phrases in Persian Texts
Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step in many applications such as machine translation, information retrieval, and text-to-speech. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...
Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning
This paper proposes a hybrid of handcrafted rules and a machine learning method for chunking Korean. In partially free word-order languages such as Korean and Japanese, a small number of rules dominate the performance due to their well-developed postpositions and endings. Thus, the proposed method is primarily based on the rules, and then the residual errors are corrected by adopting a memo...
Enrichir et raisonner sur des espaces sémantiques pour l'attribution de mots-clés (Enriching and reasoning on semantic spaces for keyword extraction) [in French]
This article presents a multi-modular hybrid system for extracting keywords from a corpus of scientific articles. The system is multi-modular because it integrates components executing transformations on 1) the morphosyntactic level (lemmatization and chunking), 2) the semantic level (Reflected Random Indexing), as well as upon more 3) « pra...
Chunking Clinical Text Containing Non-Canonical Language
Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We eval...